ABSTRACT A strategy based on the gene trap was developed to
prescreen mouse embryonic stem cells for insertional mutations in
genes encoding secreted and membrane-spanning proteins. The
"secretory trap" relies on capturing the N-terminal signal sequence
of an endogenous gene to generate an active β-galactosidase fusion
protein. Insertions were found in a cadherin gene, an unc-6-related
laminin (netrin) gene, the sek receptor tyrosine kinase gene, and
genes encoding two receptor-linked protein-tyrosine phosphatases,
LAR and PTPκ. Analysis of homozygous mice carrying insertions in
LAR and PTPκ showed that both genes were effectively disrupted,
but neither was essential for normal embryonic development.

The identification and mutation of secreted and transmem- 
brane proteins expressed during early mouse embryogenesis is 
a prerequisite for understanding cell-cell interactions required 
for mammalian development. Expression cloning methods to 
isolate embryonic cDNAs encoding this class of proteins (1, 2) 
are technically demanding and potentially biased in favor of 
smaller, abundantly transcribed mRNAs. Gene trapping in 
mouse embryonic stem (ES) cells offers a rapid, but essentially 
random, method to identify and simultaneously mutate genes 
expressed during mouse development (3). Because gene trap- 
ping relies on random insertion into the genome of cells, the 
detection of genes should not be influenced by the relative 
abundance of transcripts in ES cells. However, it is anticipated 
that genes composed of large introns will be more readily 
detected by gene trap vectors as they present a larger target for 
insertion.

Conventional gene trap vectors contain a splice acceptor
sequence linked to the lacZ or βgeo reporter gene (4-8); the
latter is a lacZ-neomycin phosphotransferase fusion gene.
When these vectors integrate within the introns of genes,
β-galactosidase (β-gal) fusion proteins are produced that
include the N terminus of the endogenous gene present at the
site of insertion. β-gal enzyme activity in cell lines transfected
with gene trap constructs has been observed in a variety of
subcellular locations (8, 9), presumably reflecting the acquisition
of endogenous protein domains that act to sort the fusion
protein to different intracellular compartments. Here, we have
exploited the differential sorting of β-gal fusion proteins as
a means to capture genes encoding N-terminal signal sequences,
genes therefore likely to be expressed on the cell surface.

The publication costs of this article were defrayed in part by page charge
payment. This article must therefore be hereby marked "advertisement" in
accordance with 18 U.S.C. §1734 solely to indicate this fact.

MATERIALS AND METHODS

Vectors. The βgeo reporter in all vectors was obtained from
pGT1.8geo by replacing the Cla I (unique in lacZ)/Sph I
(unique in neo) fragment of the gene trap vector pGT1.8 with
the Cla I/Sph I fragment of pSAβgeo (7). pGT1.8 is a
derivative of pGT4.5 (4) where the 3' En-2 sequences were
replaced with the 0.2-kb Bcl I/BamHI simian virus 40 poly(A)
signal (10). The parental vector pActβgeo contains the 0.5-kb
human β-actin promoter (10) linked to the βgeo/simian virus
40 poly(A) cassette. The start of βgeo translation was engineered
to contain a Kozak consensus sequence with unique Sal I and
Nru I sites on either side for generating subsequent fusions
(SDK oligonucleotide: 5'-GTCGACCTGCAGGTCGGAGGCCACCATGGCTCGCGAT,
from S. Darling, Medical Research Council Mammalian Development
Unit, London). Sal I sites were placed at each end of a Bal I
fragment containing the entire coding region of the rat CD4
cDNA (11). A 0.45-kb Sal I/Kpn I fragment containing the
N-terminal cleavable signal sequence (SS) of CD4 or a 1.4-kb
Sal I/Nde I fragment containing the entire CD4 coding region was
cloned into Sal I/Nru I-digested pActβgeo to generate pActSSβgeo
and pActSSTMβgeo, respectively. The secretory trap vector
pGT1.8TM includes the 0.7-kb Pst I/Nde I fragment of CD4
containing the transmembrane domain (TM) inserted in-frame
with βgeo in pGT1.8geo.
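
As a quick consistency check on the SDK oligonucleotide described in the Vectors section, the following minimal Python sketch locates the Sal I site, the Kozak consensus, and the Nru I site in the sequence. The recognition sequences (GTCGAC for Sal I, TCGCGA for Nru I) and the Kozak core GCCACCATGG are standard values supplied here as assumptions; the text names the sites but does not spell them out. The helper name find_sites is hypothetical.

```python
# Sketch: locate the landmarks in the SDK oligonucleotide (Vectors section).
SDK_OLIGO = "GTCGACCTGCAGGTCGGAGGCCACCATGGCTCGCGAT"

# Standard recognition sequences (assumed; the text names the sites only).
MOTIFS = {
    "Sal I": "GTCGAC",
    "Kozak": "GCCACCATGG",
    "Nru I": "TCGCGA",
}


def find_sites(seq, motifs):
    """Return {name: [0-based start positions]} for every motif found in seq."""
    hits = {}
    for name, motif in motifs.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        hits[name] = positions
    return hits


sites = find_sites(SDK_OLIGO, MOTIFS)
atg = SDK_OLIGO.find("ATG")  # first ATG, i.e. the engineered start codon

for name, positions in sites.items():
    print(name, positions)
print("ATG start codon at", atg)
```

In this sketch each motif occurs exactly once, and the ATG lies between the Sal I and Nru I sites, which agrees with the description of unique sites on either side of the translation start.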

ES Cell Culture. CGR8 ES cells (a feeder-independent cell
line derived from strain 129/Ola mice by J. Nichols; ref. 12)
were maintained in Glasgow MEM/BHK12 medium containing
0.23% sodium bicarbonate, 1× MEM essential amino acids,
2 mM glutamine, 1 mM pyruvate, 50 μM 2-mercaptoethanol,
10% (vol/vol) fetal calf serum (Globepharm, Surrey, U.K.), and
100 units of differentiation-inducing activity/leukemia-inhibitory
factor per ml. Transiently transfected cells were obtained by
electroporating 10⁷ ES cells with 100 μg of uncut plasmid DNA
in a volume of 0.8 ml of PBS by using a Bio-Rad Gene Pulser
set at 250 μF/250 V and cultured for 36 hr on gelatinized
coverslips prior to analysis. To obtain stable cell lines,
between 5 × 10⁷ and 10⁸ CGR8 ES cells were mixed with 150 μg
of linearized plasmid DNA and electroporated at 3 μF/800 V.
Cells (5 × 10⁶) were plated on 10-cm dishes and selected in
the presence of Geneticin (GIBCO) at 200 μg/ml. To assay
β-gal enzyme activity and protein, ES cells were grown on
gelatinized coverslips and stained with 5-bromo-4-chloro-3-indolyl
β-D-galactoside (X-Gal) (13) or with polyclonal rabbit
anti-β-gal antiserum (a gift from J. Price, National Institute of
Medical Research) and fluorescein isothiocyanate-conjugated
donkey anti-rabbit IgG (Jackson ImmunoResearch) (14). To
permeabilize membranes, cells were treated with 0.5% Nonidet
P-40 prior to antibody staining.

Abbreviations: ES, embryonic stem; β-gal, β-galactosidase; SS,
N-terminal cleavable signal sequence; TM, transmembrane domain;
X-Gal, 5-bromo-4-chloro-3-indolyl β-D-galactoside.
†To whom reprint requests should be addressed at: Centre for Genome
Research, University of Edinburgh, King's Buildings, West Mains
Road, Edinburgh EH9 3JQ, U.K.

RNA Analysis and Rapid Amplification of cDNA Ends
(RACE) Cloning. Northern blots and 5' RACE cloning were
carried out as described (8). Probes containing the cytoplasmic
region and 3' untranslated region of the genes encoding the
protein-tyrosine phosphatases LAR and PTPκ were generously
provided by B. Goldstein (Thomas Jefferson University,
Philadelphia) and J. Sap (New York University Medical
Center, New York). Several modifications were incorporated
into the 5' RACE procedure: (i) microdialysis (0.025-μm
filters; Millipore) was used in place of ethanol precipitation,
(ii) nested PCR (30 cycles each) was carried out using an
anchor primer (5'-GGTTGTGAGCTCTTCTAGATGG) and
a primer specific to CD4 (5'-AGTAGACTTCTGCACAGACACC)
followed by size selection on agarose gels and a second
round of PCR with the anchor and the En-2 256 (8) primers,
and (iii) Chromo Spin 400 columns (Clontech) were used to
size select Xba I/Kpn I-digested PCR products prior to cloning.

ES Cell Chimeras. Chimeric embryos and germ-line mice 
were generated by injection of C57BL/6 blastocysts (8). Em- 
bryos at the appropriate stages were dissected, fixed, and 
stained with X-Gal (13). 

RESULTS AND DISCUSSION 

To test if β-gal fusions that contain an SS could be identified
by their subcellular distribution, vectors were constructed to
express portions of the CD4 type I membrane protein (11)
fused to βgeo, a chimeric protein that possesses both β-gal and
neomycin phosphotransferase activities (7) (Fig. 1A). βgeo
fused to the signal sequence of CD4 (pActSSβgeo) accumulated
in the endoplasmic reticulum (ER) but lacked β-gal
activity (Fig. 2 C and D). Therefore, translocation of βgeo into
the lumen of the ER appeared to abolish β-gal enzyme
function. β-gal activity was restored by including the TM of
CD4 (pActSSTMβgeo) (Fig. 2 E and F), presumably by
keeping β-gal in the cytosol. Active protein was localized in the
ER and in multiple cytoplasmic inclusions, a pattern only
rarely observed in ES colonies obtained with the conventional
gene trap vector, probably because insertions downstream of
both a signal sequence and TM of genes encoding
membrane-spanning proteins are infrequent. Therefore, to identify
insertions in both secreted and type I membrane proteins, our gene
trap vector pGT1.8geo was modified to include the TM of CD4
upstream of βgeo (Fig. 1B). With the secretory trap vector
pGT1.8TM we would now expect to restore β-gal enzyme
activity to any insertion occurring downstream of a signal
sequence.

In a pilot experiment, the relative efficiency of our gene trap
vector was compared to the original pSAβgeo (7) after
electroporation into ES cells (Fig. 1B). Although pSAβgeo
contains a start of translation that is absent in our vectors, fewer
G418-resistant colonies were obtained with pSAβgeo than
with pGT1.8geo. More importantly, nearly all the colonies
derived with pSAβgeo showed high levels of β-gal activity,
whereas our vector showed a broad range of staining intensities
and a greater proportion of β-gal-negative colonies. Sequence
analysis of the pSAβgeo vector revealed a point mutation in
neo known to reduce its enzyme activity (15). Therefore, the
pSAβgeo vector appears to preselect for genes expressed at
high levels, and correction of the neo mutation in our vectors
now allows access to genes expressed at low levels (see
below).

Fig. 1. The secretory trap vector shows selective
activation of the β-gal enzyme in fusions that capture
an SS. (A) CD4-βgeo expression constructs. The
human β-actin (Act) promoter was used to drive
expression of βgeo alone (pActβgeo), fused in frame
with the SS of CD4 (pActSSβgeo), or fused to the SS
and TM of CD4 (pActSSTMβgeo). Results of transient
transfection experiments shown in Fig. 2 are
summarized on the right. (B) Relative efficiency of
gene trap and secretory trap vectors in ES cells.
pSAβgeo (7) contains the minimal adenovirus type
2 major late splice acceptor (SA, arrowhead; open
box, intron; shaded box, exon) and the bovine growth
hormone polyadenylylation signal (pA). The mutation
in neo (*) present in pSAβgeo was corrected in
our vectors by replacement of the Cla I (C)/Sph I (S)
fragment of βgeo. pGT1.8geo and pGT1.8TM contain
the mouse En-2 splice acceptor and simian virus
40 polyadenylylation signal and lack a translation
initiation signal (ATG). Vectors were linearized
prior to electroporation at either the Sca I (Sc) site
of the plasmid backbone (represented by the line) of
pSAβgeo or at the HindIII (H) site at the 5' end of
the En-2 intron. The number of G418-resistant
(G418ʳ) colonies obtained in the electroporation of
5 × 10⁷ (experiment 1) or 10⁸ (experiment 2) cells
and the proportion that express detectable β-gal
activity are indicated on the right. (C) Model for the
selective activation of β-gal in the secretory trap
vector. Insertion of pGT1.8TM in genes that contain
an SS produces βgeo fusion proteins that are inserted
in the membrane of the endoplasmic reticulum in a
type I configuration. The TM of the vector retains
βgeo in the cytosol where β-gal remains active.
Insertion of the vector in genes that lack a signal
sequence produces fusion proteins with an internal
TM domain. Insertion of these proteins in a type II
orientation exposes βgeo to the lumen of the ER
where β-gal activity is lost.

Fig. 2. Localization of βgeo fusion protein and enzyme activity in ES cells. (A-F) Transient transfection of ES cells with pActβgeo (A and B),
pActSSβgeo (C and D), and pActSSTMβgeo (E and F) and assay for β-gal activity (X-Gal; bright field) and protein (immunofluorescence; dark
field). βgeo alone was evenly distributed in the cytoplasm of cells. βgeo fused to the SS of CD4 accumulated in the ER, resulting in the loss of
β-gal activity. β-gal activity was restored in fusions that contain both the SS and TM of CD4, and βgeo localizes to the ER and in multiple cytoplasmic
inclusions. (G and H) Stable cell lines transfected with pGT1.8TM showing constitutive β-gal activity (G) or activity induced in a subset of
differentiated cell types (H) in cultures of ST534 and ST519 cells, respectively. (I-L) Detection of βgeo protein in β-gal-positive (I and J) and
β-gal-negative (K and L) cell lines in permeabilized (I and K) and nonpermeabilized (J and L) cells. In β-gal-negative cells, the βgeo fusion was
detected on the cell surface in the absence of membrane permeabilization, indicating that the loss of β-gal activity correlated
with a type II orientation of the fusion protein (see the model in Fig. 1C).

Approximately half of the pGT1.8geo colonies express
detectable β-gal activity and show the various subcellular
patterns of β-gal staining observed previously. In contrast,
only 20% of the pGT1.8TM colonies express detectable β-gal activity
(Fig. 1B), and all display the "secretory" pattern of βgeo
activity characteristic of the pActSSTMβgeo fusion (compare
Fig. 2E with Fig. 2 G and H). The reduction in the proportion
of β-gal-positive colonies and the singular pattern of β-gal
staining observed with the secretory trap vector suggested that
β-gal, but not neomycin phosphotransferase activity, is lost in
fusions with proteins that do not possess a signal sequence.
Loss of β-gal activity would be predicted to occur if fusions that
lack an SS were inserted into the membrane in a type II
orientation placing βgeo in the lumen of the ER. To confirm
this, several β-gal-negative cell lines were isolated and analyzed
by immunofluorescence. In these lines, the fusion protein
was detected on the surface of cells in the absence of detergent
permeabilization (Fig. 2 K and L), indicating a type II
orientation of the βgeo fusion protein. In contrast, detergent
permeabilization was essential to detect the fusion protein in
β-gal-positive cell lines (Fig. 2 I and J), as would be expected
for type I membrane proteins. From these data, we propose a
model to explain the selective activation of β-gal in pGT1.8TM
(Fig. 1C). In the absence of an SS, the TM of the fusion protein
acts as a signal anchor sequence (16) to place βgeo in a type
II orientation, exposing βgeo to the lumen of the ER where
β-gal activity is lost. In fusions that contain an SS, the TM in the
vector acts to prevent βgeo from entering the ER lumen,
thereby preserving its cytosolic enzyme activity.
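
The proposed model reduces to a small decision rule, which the following Python sketch spells out for the four combinations of signal sequence and transmembrane domain discussed in the text. The names Fate and predict_fate and the wording of the topology labels are hypothetical, chosen only for illustration; the outcomes follow the transient transfection results and the model described above.

```python
# Sketch of the proposed model: predicted fate of a betageo fusion protein.
from typing import NamedTuple


class Fate(NamedTuple):
    topology: str       # how the fusion sits relative to the ER membrane
    betageo_side: str   # compartment that betageo faces
    bgal_active: bool   # beta-gal activity is retained only in the cytosol


def predict_fate(has_signal_sequence: bool, has_tm: bool) -> Fate:
    """Apply the model: betageo exposed to the ER lumen loses beta-gal activity."""
    if has_signal_sequence and has_tm:
        # SS captured from the trapped gene plus the vector TM (secretory trap).
        return Fate("type I membrane protein", "cytosol", True)
    if has_signal_sequence:
        # SS without a TM: betageo is translocated into the ER.
        return Fate("translocated into ER", "ER lumen", False)
    if has_tm:
        # No SS: the TM acts as a signal anchor.
        return Fate("type II membrane protein", "ER lumen", False)
    # Neither element: a conventional cytosolic fusion.
    return Fate("cytosolic protein", "cytosol", True)


for ss in (False, True):
    for tm in (False, True):
        print(f"SS={ss!s:5} TM={tm!s:5} -> {predict_fate(ss, tm)}")
```

Only the combination of a captured SS with the vector TM yields a membrane-inserted fusion that keeps β-gal active, which is the basis of the selective assay.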

5' RACE (17) was used to clone a portion of the endogenous
gene associated with secretory trap insertions that express
detectable β-gal activity (Table 1). Northern and RNA dot blot
analysis showed that approximately one-half (5 of 11 analyzed
in this study) of the G418-resistant cell lines failed to properly
utilize the splice acceptor and produce fusion transcripts that
hybridize within intron sequences of the vector (data not
shown). These insertions presumably do not represent true
gene trap events and thus were not analyzed further. Northern
blot analysis of six properly spliced lines (ST484, ST497, ST514,
ST519, ST531, and ST534) detected a unique-sized βgeo fusion
transcript in each cell line (Fig. 3A). At least two independent
RACE cDNAs were cloned from each cell line. The cDNAs
obtained from all cell lines except ST514 detected both the
fusion transcript and an endogenous transcript common to
all cell lines as shown for the ST534 probe (Fig. 3A). The
ST514 insertion illustrates that genes expressed at very low
levels in ES cells can be trapped. In ST514 cultures, β-gal
activity was observed only in a few differentiated cells, and
accordingly neither the fusion nor the endogenous transcripts
could be detected on Northern blots (Fig. 3A and data
not shown).

Sequence analysis of the RACE cDNAs in all cases showed
the proper use of the splice acceptor and a single open reading
frame in-frame with βgeo. One insertion occurred in netrin, a
gene homologous to the unc-6 gene of Caenorhabditis elegans
(18) recently cloned in the chicken (19). The remaining five
insertions interrupted the extracellular domains of genes
encoding membrane-spanning proteins: a cadherin most closely
related to the protein encoded by the fat tumor suppressor gene of
Drosophila (20), the sek-encoded receptor tyrosine kinase (21),
the receptor-linked protein-tyrosine phosphatase PTPκ
(22), and two independent insertions in a second receptor-linked
protein-tyrosine phosphatase LAR (23). These results
support the prediction that β-gal activity is dependent on
acquiring an SS from the endogenous gene at the site of
insertion.

Table 1

Cell    β-gal expression*   Transcript size,† kb                    Phenotype§
line    ES      Diff.       Fusion    Endogenous   Gene‡            (wt:het:hom)

484     +       +           7.5       7.5          LAR (1833)       NA
497     +       +/-         6.5       7            sek (1376)       ?
514     -       +/-         ND        ND           Netrin (4721)    NA
519     -       +/-         >12       >12          Novel cadherin   NA
531     -       +/-         6.1       5.3          PTPκ (2000)      Viable (36:57:27)
534     +       +           6.0       7.5          LAR (706)        Viable (36:79:25)

*Based on X-Gal staining of ES cell cultures that contain a subset of spontaneously differentiated (Diff.) cell types. +/-
indicates expression in a subset of differentiated cell types.

†Transcript sizes were determined from Northern blots (Fig. 3 and data not shown). ND, not detected.

‡Numbers in parentheses indicate the insertion site within the endogenous gene based on the nucleotide sequence of rat LAR
(GenBank accession no. L11586), mouse sek (S51422), chicken netrin 1 (L34549), and mouse PTPκ (L10106). The GenBank
accession numbers for the mouse netrin and cadherin genes are U23505 and U23536, respectively.

§Based on the recovery of homozygous (hom) animals at weaning age in litters from heterozygous (het) intercrosses. ?,
Phenotype unknown, breeding in progress. NA, not applicable, insertion not in germ line. wt, Wild type.

Reporter gene activity associated with each insertion was
analyzed in embryos (Fig. 4). The pattern of β-gal expression
in embryos derived from insertions in the sek (ST497) and
netrin (ST514) genes was very similar to published RNA in situ
results for the mouse sek (24) and chicken netrin (25) genes,
which provides further evidence that gene trap vectors
accurately report the pattern of endogenous gene expression (8).
Both insertions in LAR (ST484 and ST534) exhibited weak,
widespread expression in 8.5-day embryos. The insertion in
PTPκ (ST531) showed β-gal expression in endoderm and
paraxial mesoderm; the highest expression was observed in
newly condensing somites. β-gal expression in tissues of adult
mice carrying insertions in LAR and PTPκ correlated well
with known sites of mRNA expression (22, 26). The highest
levels of β-gal activity were found in the lung, mammary gland,
and brain of ST534 (LAR) mice and in the kidney, brain, and
liver of ST531 (PTPκ) mice (data not shown).

ES cell lines containing insertions in the LAR, PTPκ, and
sek genes have been transmitted to the germ line of mice. Thus
far, breeding analysis showed that mice homozygous for the
LAR and PTPκ insertions are viable and fertile. To confirm
that the LAR and PTPκ genes were effectively disrupted,
Northern blots of RNA from wild-type and homozygous adult
tissues were probed with cDNAs from regions downstream of
each insertion site. For both mutations, full-length transcripts
were not detected in homozygous animals (Fig. 3B). Because
secretory trap insertions generate fusions that in some cases
will contain a large portion of the extracellular domain of the
target gene, the production of both loss of function and gain
of function (i.e., dominant-negative) mutations is possible.
However, since the βgeo fusions with LAR and PTPκ include
<300 amino acids of the extracellular domains of these proteins,
these insertions likely represent null mutations. LAR
and PTPκ are members of an ever-increasing family of receptor
protein-tyrosine phosphatase genes (27). Therefore, the
absence of overt phenotypes in LAR and PTPκ mutant mice
is likely due to functional overlap between gene family
members as has been observed with targeted mutations in
multiple members of the myogenic and Src family genes
(28-30).
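
The genotype counts reported in Table 1 for litters from heterozygous intercrosses (36:57:27 for the PTPκ insertion, 36:79:25 for the LAR insertion) can be compared with the 1:2:1 ratio expected if homozygotes survive normally. The Python sketch below runs a chi-square goodness-of-fit test on those counts. This check is an illustration added here, not an analysis reported in the paper, and the function name chi_square_1_2_1 is hypothetical. For 2 degrees of freedom the chi-square tail probability is exactly exp(-x/2), so only the standard library is needed.

```python
# Sketch: chi-square goodness-of-fit of genotype counts to a 1:2:1 ratio.
import math


def chi_square_1_2_1(wt, het, hom):
    """Return (chi2, p_value) for observed wt:het:hom counts against 1:2:1."""
    total = wt + het + hom
    observed = (wt, het, hom)
    expected = (total / 4, total / 2, total / 4)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three classes give 2 degrees of freedom, where the chi-square
    # survival function has the closed form exp(-x / 2).
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


# Counts (wt, het, hom) taken from Table 1.
counts = {
    "ST531 (PTP-kappa)": (36, 57, 27),
    "ST534 (LAR)": (36, 79, 25),
}

for line, (wt, het, hom) in counts.items():
    chi2, p = chi_square_1_2_1(wt, het, hom)
    print(f"{line}: chi2 = {chi2:.2f}, p = {p:.3f}")
```

Both statistics (1.65 for ST531 and about 4.04 for ST534) fall below 5.99, the 5% critical value for 2 degrees of freedom, so under this simple test neither cross departs significantly from Mendelian expectation, in line with the conclusion that homozygotes are viable.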

In summary, we have modified the gene trap vector to detect
insertions in genes that code for cell surface proteins, a class
of genes previously missed by conventional gene trap vectors.
The strategy relies on the retention of β-gal enzyme activity in
reporter gene fusions that acquire an SS (Fig. 1C), thus providing
a simple selective assay to recover insertional mutations
in secreted and type I membrane proteins. The vector was
demonstrated to detect genes expressed at very low levels in ES
cells; therefore, it should be possible to develop screens to
identify lineage-specific genes induced upon ES cell differentiation.
A screen for cell surface genes expressed in restricted
patterns in the early embryo should help to identify key
regulators of cell-cell interactions required for the establishment
of patterning of the primary embryonic germ layers.
Furthermore, this approach should be generally applicable to
other cultured cell lines.

Fig. 3. RNA analysis of secretory trap insertions.
(A) Northern blot of 15 μg of ES cell RNA
hybridized with the lacZ gene and reprobed with
a RACE cDNA fragment cloned from the ST534
(LAR) insertion. The pGT1.8TM vector is predicted
to contribute 5 kb to the size of the fusion
transcript. As expected, the ST534 RACE clone
detects both the 6-kb fusion transcript in ST534
cells and the 7.5-kb endogenous LAR transcript
in all cell lines. (B) Northern blots of 10 μg of
RNA from wild-type (+/+), heterozygous (+/-),
and homozygous (-/-) lung of ST534
(LAR) and kidney of ST531 (PTPκ) adult mice
hybridized with LAR and PTPκ cDNA sequences
3' to the insertion and reprobed with the
ribosomal S12 gene (rps12) as a loading control.

Based on the first six genes identified, the secretory trap
shows a preference for large membrane-spanning receptors.
The recovery of two independent insertions in LAR further
suggests that the current vector design will access a restricted
class of genes. The requirement for gene trap vectors to insert
in introns of genes is predicted to impose an inherent bias in
favor of detecting genes composed of large intronic regions
and consequently limit the number of genes accessible with this
approach. To access a larger pool of genes, we have constructed
vectors in each of the three possible reading frames.
Furthermore, to recover insertions in smaller transcription
units composed of few or no introns, we developed an "exon
trap" version of the vector that lacks a splice acceptor. Each
vector yielded similar numbers of G418-resistant colonies, a
similar proportion of which exhibit the secretory pattern of
β-gal activity (W.C.S. and J. Brennan, unpublished results).
With a combination of vectors, we now expect to obtain a more
representative sampling of the genome that should include
both membrane receptors and secreted ligands.